Project supported by the National Natural Science Foundation of China (Grant No. 61402368), the Aerospace Support Fund, China (Grant No. 2017-HT-XGD), and the Aerospace Science and Technology Innovation Foundation, China (Grant No. 2017ZD53047).
In traditional multi-scale transform methods, the high-frequency components are approximately sparse and can represent different kinds of detail information. In the low-frequency component, however, few coefficients lie near zero, so the low-frequency image information cannot be sparsely represented. The low-frequency component carries the main energy of the image and depicts its profile, and fusing it directly is not conducive to an accurate fusion result. Therefore, this paper presents an infrared and visible image fusion method that combines the multi-scale and top-hat transforms. On one hand, the new top-hat transform can effectively extract the salient features of the low-frequency component; on the other hand, the multi-scale transform can extract high-frequency detail information in multiple scales and from diverse directions. Combining the two is conducive to acquiring more features and more accurate fusion results. Specifically, for the low-frequency component, a new type of top-hat transform is used to extract low-frequency features, and different fusion rules are then applied to fuse the low-frequency features and the low-frequency background; for the high-frequency components, the product-of-characteristics method is used to integrate the detail information. Experimental results show that the proposed algorithm obtains more detail information and clearer infrared targets than traditional multi-scale transform methods. Compared with state-of-the-art fusion methods based on sparse representation, the proposed algorithm is simple and efficacious, and its time consumption is significantly reduced.
A visible image is formed by the reflection characteristics of ground features and provides rich scene and spatial-detail information, but it cannot reveal hidden people or objects. An infrared image is captured via the thermal radiation characteristics of an object and has a strong target-detection capability but low spatial resolution. Fusing infrared and visible images can combine the better target recognition of infrared images with the higher spatial resolution and rich scene information of visible images, thus improving the ability to correctly detect and identify targets in complex environments. Such fusion has broad applications in military, transportation, security, and many other fields.[1–4]
Since multi-scale transforms (MSTs) offer multi-resolution analysis and time-frequency localization, they have long been a hot topic in the fusion of infrared and visible images, whose information is strongly complementary.[5,6] The most widely used multi-scale transforms include the Laplacian pyramid,[7] the discrete wavelet transform (DWT),[8–11] the dual-tree complex wavelet transform (DTCWT),[12–14] the contourlet transform (CT),[15] and the non-subsampled contourlet transform (NSCT).[16–18] The fusion process can usually be divided into three steps. Firstly, the source images are multi-scale transformed to acquire their low- and high-frequency components. Secondly, fused coefficients are obtained for the low- and high-frequency components via specific fusion rules. Finally, the fused coefficients are inversely transformed to attain the fusion result. Such a multi-scale-transform-based method can efficaciously extract source-image features in multiple scales and from diverse directions. However, its fusion rules are mostly designed for the high-frequency components. Owing to the incompleteness of multi-scale decomposition, certain details are still retained in the low-frequency component, which is even more pronounced when the number of decomposition layers is small. Hence, current fusion rules do not fully account for features such as edge contours in the low-frequency component, incurring information loss: the energy and contours contained in the low-frequency component cannot be accurately transmitted to the fusion result, which degrades its quality. Therefore, this study focuses on the design of fusion rules for the low-frequency component, making up for the limited feature extraction of multi-scale transforms and obtaining better fusion results.
In recent years, the top-hat transform of mathematical morphology has been successfully applied to infrared and visible image fusion for the following reasons. As a morphological tool, the top-hat transform can simultaneously extract from the source images (the infrared and visible images to be fused) the bright features (local grayscale maxima) and dark features (local grayscale minima), and separate these bright-&-dark features from the background information, which is conducive to obtaining the complementary information of the source images during fusion. Using the top-hat transform, Bai and co-workers[19] varied the scale of the structural elements, extracted multi-level bright-&-dark features of the source images, and obtained rich fusion results. Chen and co-workers[20] further improved the top-hat transform of Ref. [19] by fusing the bright-&-dark features separately from the background information, thus enhancing the fusion of infrared and visible images. Such top-hat-based fusion methods, however, operate in the image domain at a single scale. Considering that images usually contain different features at diverse scales and in disparate directions, and that such features are often exactly the prominent information that image fusion needs to distinguish and retain, this study proposes that combining the multi-scale and top-hat transforms, thus taking advantage of both, may achieve a better fusion result. Concretely, for the high-frequency components, the multi-scale transform is used to extract detail information in multiple scales and from diverse directions; for the low-frequency component, the top-hat transform of mathematical morphology is utilized to separate the bright-&-dark features from the background information, thereby reaping a fusion image that is both rich and accurate.
In the following sections, we first introduce the theoretical basis of multi-scale and top-hat transforms in Section
Currently, in the field of infrared and visible image fusion, traditional multi-scale transforms can extract source-image features in multiple scales and from diverse directions, which is beneficial for obtaining rich fusion results; they have therefore become an active research direction. The most typical and widely used multi-scale transforms are the DWT, the DTCWT, and the NSCT.
The DWT is the most representative wavelet transform and enjoys merits such as directionality and localization. A DWT of the source image yields a low-frequency component, denoted LL, and three high-frequency components, denoted LH, HL, and HH, which carry details in the horizontal, vertical, and diagonal directions, respectively. The DTCWT overcomes two drawbacks of the traditional wavelet transform, its limited directional selectivity and its lack of translation invariance, by adopting a two-way DWT structure in a binary tree: one tree generates the real part of the DTCWT while the other generates its imaginary part, thereby obtaining approximate translation invariance, directional selectivity, and computational efficiency. The NSCT is a true two-dimensional image transform that can capture the essential geometric structure of the source image; it needs no downsampling or upsampling in the decomposition and reconstruction of the image and is translation invariant. The NSCT consists of non-subsampled pyramid filter banks (NSPFBs) and non-subsampled directional filter banks (NSDFBs). The sub-band in each direction is equal in size to the source image.
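To make the LL/LH/HL/HH decomposition concrete, the following minimal Python sketch (an illustration added here, not part of the original experiments) performs one level of 2-D DWT with PyWavelets:

```python
# One level of 2-D DWT: yields the low-frequency band LL and the three
# high-frequency bands LH, HL, HH described above.
import numpy as np
import pywt

img = np.random.rand(240, 320)            # stand-in for a source image
LL, (LH, HL, HH) = pywt.dwt2(img, 'db1')  # single decomposition level

# LL carries the coarse approximation; LH/HL/HH carry horizontal,
# vertical, and diagonal detail, each at half the spatial resolution.
print(LL.shape, LH.shape, HL.shape, HH.shape)
```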
From the above analysis, it can be seen that multi-scale transforms of ever better performance have been proposed and their feature-extraction capability continuously enhanced, effectively advancing image fusion technology. However, most existing multi-scale fusion methods concentrate on high-frequency fusion rules, while the rule for the low-frequency component remains standard and uniform. Since the low-frequency component still contains much edge and contour information, such important information cannot be retained under these rules, and the fusion result suffers. For this reason, the present study addresses this issue and focuses on fusion rules for the low-frequency component of the multi-scale transform, in order to improve its accuracy and enhance the fusion effect.
The top-hat transform is an important tool in mathematical morphology; in recent years it has been successfully applied to the fusion of infrared and visible images and has achieved good fusion effects. The classical top-hat transform can only extract the bright features of the image, which correspond to local pixel peaks of the source image, while the dark features correspond to local pixel valleys. To extract the bright-&-dark features of the source image at the same time, multiple subtraction operations are required; owing to accumulated calculation errors, these are prone to inaccurate and incomplete feature extraction and are inconvenient in practice. Hence, a new type of top-hat transform is utilized: when the open and close operations are performed, two structural elements of different scales are employed.
Such an operation can eliminate both bright and dark features at the same time. The new top-hat transform is formed by subtracting the result of the new open operation from the source image, so the bright-&-dark features of the image are obtained simultaneously. Furthermore, the bright-&-dark features extracted by the new top-hat transform are consistent with the source image, viz. the bright features correspond to local pixel peaks and the dark features to local pixel valleys. This study therefore adopts the new top-hat transform to extract image features, which not only obtains bright-&-dark features simultaneously but also avoids multiple subtraction operations, thereby improving the accuracy of feature extraction. Although one of the structural elements has a large scale, the computational load is significantly reduced relative to the traditional top-hat transform. The definition of the new top-hat transform is as follows:
In Fig.
For infrared and visible image fusion, the source image is first multi-scale transformed to extract its low- and high-frequency components, which represent disparate image features. The high-frequency components correspond to features with sharp grayscale variations in the source image, such as edges, textures, and corners. Their coefficients are approximately sparse (with relatively few non-zero elements) and can depict characteristic features comparatively well; therefore, taking into account the characteristics of both single pixels and local regions, the product-of-characteristics method is adopted as the fusion rule. The low-frequency component corresponds to features with slow grayscale variations in the source image, such as contours and energy. Limited by the scale and directional selectivity of the multi-scale transform, the low-frequency component usually still contains many important features of the source image, and if it is handled indiscriminately, the fusion result will be adversely affected. This study therefore utilizes the new top-hat transform to first separate the bright-&-dark features of the low-frequency component from its background, after which different fusion strategies are employed for each, so that the low-frequency part of the fused image is both feature-rich and accurate. As stated above, the current study combines the classical multi-scale transform with the new top-hat transform and takes advantage of both in order to obtain superior fusion results. Figure
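The overall flow just described can be summarized in the following sketch. Its structure and the two placeholder rule functions are assumptions for illustration only; the actual low- and high-frequency rules are detailed in the subsections below.

```python
# Assumed skeleton of the proposed fusion flow (not the paper's code).
import numpy as np
import pywt

def fuse_low_tophat(low_ir, low_vis):
    # Placeholder: to be replaced by the top-hat-based low-frequency rule.
    return 0.5 * (low_ir + low_vis)

def fuse_high_product(high_ir, high_vis):
    # Placeholder: to be replaced by the product-of-characteristics rule.
    return np.where(np.abs(high_ir) >= np.abs(high_vis), high_ir, high_vis)

def fuse_mst_tophat(ir, vis, wavelet='db1', levels=3):
    # 1. Multi-scale decomposition of both source images.
    c_ir = pywt.wavedec2(ir, wavelet, level=levels)
    c_vis = pywt.wavedec2(vis, wavelet, level=levels)
    # 2. Fuse the low-frequency component and each high-frequency band.
    fused = [fuse_low_tophat(c_ir[0], c_vis[0])]
    for t_ir, t_vis in zip(c_ir[1:], c_vis[1:]):
        fused.append(tuple(fuse_high_product(a, b)
                           for a, b in zip(t_ir, t_vis)))
    # 3. Inverse transform yields the fusion result.
    return pywt.waverec2(fused, wavelet)
```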
Owing to the incompleteness of multi-scale decomposition, certain detail information is still retained in the low-frequency component, which is more pronounced when there are relatively few decomposition levels. Traditional multi-scale transforms usually fuse the low-frequency component by weighted averaging, neglecting its detailed features, thus losing detail information of the source image and decreasing the contrast of the fused image. Hence, for the low-frequency component, this study adopts a fusion rule based upon the new top-hat transform: the bright-&-dark features are processed separately from the background by exploiting the complementary and redundant information of the source images, which enriches the fusion result with more details and a more evenly distributed energy.
Firstly, perform the new open operation on the low-frequency component $I_i^{\rm low}$ ($i = 1, 2$), where $B_1$ and $B_2$ are square flat structural elements and the scale of $B_1$ is greater than that of $B_2$, to obtain the low-frequency background $I_i^{G}$ ($i = 1, 2$):
Next, apply the new top-hat transform, viz. subtract the background $I_i^{G}$ ($i = 1, 2$) from the low-frequency component $I_i^{\rm low}$ ($i = 1, 2$), to obtain the bright-&-dark features $I_i^{D}$ ($i = 1, 2$):
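Since the defining formulas appear only in the paper's displayed equations, the following sketch is a plausible reading rather than the exact definition: it assumes the new open operation erodes with the larger flat square element $B_1$ and then dilates with the smaller $B_2$, and that the new top-hat transform subtracts the resulting background from the input; the scales 21 and 7 are hypothetical.

```python
# Assumed reading of the new open operation and new top-hat transform.
import numpy as np
from scipy.ndimage import grey_dilation, grey_erosion

def new_open(img, b1=21, b2=7):
    """Erode with the larger flat square B1, then dilate with the smaller B2."""
    return grey_dilation(grey_erosion(img, size=(b1, b1)), size=(b2, b2))

def new_tophat(img, b1=21, b2=7):
    """Split an image into bright-&-dark features and background."""
    background = new_open(img, b1, b2)   # I_i^G: background estimate
    features = img - background          # I_i^D: bright-&-dark features
    return features, background
```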
Figure
The bright-&-dark features and the background of the low-frequency component are first divided into small blocks by a sliding-window strategy and then fused, because region-based fusion rules outperform single-pixel rules. With block size $n \times n$ and sliding step $step$, the background blocks $I_i^{G,\rm blocks}$ ($i = 1, 2$) and the feature blocks $I_i^{D,\rm blocks}$ ($i = 1, 2$) to be fused are obtained.
The bright-&-dark feature blocks $I_i^{D,\rm blocks}$ ($i = 1, 2$) of the low-frequency component have relatively sparse coefficients. The rule of taking the coefficient with the largest absolute value can retain more details of the low-frequency component, rendering richer detail information in the fusion result. Let $I_f^{D,\rm blocks}$ denote the bright-&-dark feature blocks of the fused image; the fusion rule is given as
The background blocks $I_i^{G,\rm blocks}$ ($i = 1, 2$) of the low-frequency component need to provide reasonable background information for the fused result. Therefore, guided by the bright-&-dark features of the low-frequency component, a weighted-average fusion rule is applied, which maximally retains features such as contours in the low frequency and is conducive to a relatively sharper contrast in the fusion result. Let $I_f^{G,\rm blocks}$ denote the fused background blocks; the weighted average is given as
Finally, add the fused bright-&-dark features $I_f^{D,\rm blocks}$ to the fused background $I_f^{G,\rm blocks}$, perform the reverse sliding-window operation to put the blocks back in order, and average each pixel by its number of superpositions to obtain the fused low-frequency component $I_f^{\rm low}$.
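A minimal sketch of this block-wise low-frequency fusion follows. The feature-energy weighting of the background blocks is an assumed reading of "guided by the bright-&-dark features", and the block size and step are hypothetical; pixels missed when the window grid does not tile exactly stay zero in this sketch.

```python
# Block-wise fusion of low-frequency features (d1, d2) and
# backgrounds (g1, g2), with overlap-add averaging on reconstruction.
import numpy as np

def fuse_low_blocks(d1, d2, g1, g2, n=8, step=4):
    h, w = d1.shape
    acc = np.zeros((h, w))   # accumulated fused values
    cnt = np.zeros((h, w))   # number of superpositions per pixel
    for r in range(0, h - n + 1, step):
        for c in range(0, w - n + 1, step):
            s = (slice(r, r + n), slice(c, c + n))
            # Feature blocks: keep the coefficients of larger magnitude.
            fd = np.where(np.abs(d1[s]) >= np.abs(d2[s]), d1[s], d2[s])
            # Background blocks: weighted average, weights taken (by
            # assumption) from the feature energy of each source block.
            e1 = np.sum(np.abs(d1[s])) + 1e-12
            e2 = np.sum(np.abs(d2[s])) + 1e-12
            fg = (e1 * g1[s] + e2 * g2[s]) / (e1 + e2)
            acc[s] += fd + fg            # fused features + background
            cnt[s] += 1
    return acc / np.maximum(cnt, 1)      # average per superposition count
```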
For the high-frequency components, traditional fusion rules are all based upon a single characteristic, computed as the local variance, the local gradient, the energy of the wavelet coefficients, etc. The biggest disadvantage of such rules is the incompleteness of the feature analysis and the insufficiency of the extracted image details: they consider only the individual information of a single pixel while neglecting the overall information of the local region, or vice versa. Ref. [24] applied the multi-resolution idea of wavelet analysis to coefficient fusion, adopted the product of multiple characteristics as the fusion rule, and obtained a fusion effect that is more natural and more comprehensive. This study uses the same two characteristics as Ref. [24] as the decision basis for coefficient fusion. In the window $l$ of size $N \times N$, define the product of characteristics
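Since the two characteristics of Ref. [24] are not restated here, the sketch below assumes, for illustration only, the single-pixel coefficient magnitude and the local-region energy as the two factors of the product:

```python
# Illustrative product-of-characteristics rule for one pair of
# high-frequency sub-bands (h1 from the infrared, h2 from the visible).
import numpy as np
from scipy.ndimage import uniform_filter

def fuse_high_product(h1, h2, n=3):
    # Single-pixel characteristic: coefficient magnitude.
    m1, m2 = np.abs(h1), np.abs(h2)
    # Local-region characteristic: mean energy in an N x N window.
    e1 = uniform_filter(h1 ** 2, size=n)
    e2 = uniform_filter(h2 ** 2, size=n)
    # Keep, pixel-wise, the coefficient whose product of the two
    # characteristics is larger.
    return np.where(m1 * e1 >= m2 * e2, h1, h2)
```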
The above specifies the fusion rules for the low- and high-frequency components. On one hand, the new top-hat transform is applied to the low-frequency component to separate its bright-&-dark features from its background, and different fusion rules are then selected according to the different feature attributes. On the other hand, for the high-frequency components, the fusion rule is based upon the product of characteristics in order to extract accurate detail information. For infrared and visible image fusion, such a strategy yields fusion results with clear targets and rich features.
The experimental data are three sets of accurately registered infrared and visible images: the UNcamp group, the duck group, and the street group, shown in Fig.
In the experimental design, all parameters of the TH1 and TH2 methods are exactly the optimal ones in Refs. [19] and [20], respectively. For the SR, JSR1, and JSR2 methods, the training set consists of 5000 image blocks of size 8 × 8 randomly selected from the source images; the number of K-SVD iterations is 20, and the decomposition error of the OMP algorithm is set to 0.1. The fusion rule of the SR method is to take the coefficient vector with the largest $\ell_1$-norm, and that of the JSR1 method is a weighted average. For the JSR2 method, the parameters for the low-frequency component are the optimal ones in Ref. [23]. Three layers of the "db1" wavelet basis are applied in the DWT and DWT+TH methods. For the DTCWT and DTCWT+TH methods, the first-stage filter is "FSfarras" and the other-stage filters are "dualfilt1". For the NSCT and NSCT+TH methods, the "pyrexc" pyramid filter and the "vk" directional filter are adopted; the number of decomposition levels is four, and the direction parameters at the four levels take the values [1, 2, 3, 4] sequentially. To verify the role of the top-hat transform in fusing the low-frequency component, the plain MST methods use the simplest average fusion rule for the low frequency, viz. taking the mean of the coefficients. For the high-frequency components, both the MST and MST+TH methods take the coefficient with the largest product of characteristics.
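Of these settings, the DWT configuration maps directly onto a common Python toolbox, as the hedged sketch below shows; the "FSfarras"/"dualfilt1" and "pyrexc"/"vk" names are MATLAB filter identifiers with no one-line Python counterpart, so only the DWT case is illustrated.

```python
# Three decomposition levels of the 'db1' basis, as stated above.
import numpy as np
import pywt

source_image = np.random.rand(240, 320)  # stand-in for a source image
coeffs = pywt.wavedec2(source_image, 'db1', level=3)
low, highs = coeffs[0], coeffs[1:]       # one LL band + 3 detail levels
```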
As mentioned earlier for the new top-hat transform, there exist structural elements $B_1$ and $B_2$ with scales
Figure
The first set of infrared and visible images, the UNcamp group, has a size of 240 × 320, and its experimental results are shown in Fig.
The second set of infrared and visible images, the duck group, has a size of 288 × 208. Experimental results are shown in Fig.
The third set of experimental images is the street group. The infrared and visible source images have a size of 632 × 496, and the experimental results are shown in Fig.
As can be seen from Table
For the fusion of infrared and visible images, a novel method combining the multi-scale and top-hat transforms has been proposed. Firstly, the source image is multi-scale transformed to obtain its low-frequency component and several high-frequency components. Next, different fusion rules are applied to the low- and high-frequency components, respectively. Because the feature-extraction capability of the multi-scale transform is limited, many important features remain in the low-frequency component; therefore, the top-hat transform of mathematical morphology is applied to further extract the background information and the bright-&-dark features. The latter are fused via the rule of taking the maximum absolute value, while the low-frequency background is fused with a weighted-average rule. Handling the bright-&-dark features and the background information separately in this way effectively retains the many important features of the low-frequency component. For the high-frequency components, taking both single pixels and local regions into consideration, the product of characteristics is set as the fusion rule. Finally, the inverse multi-scale transform is applied to the fused low- and high-frequency components to yield the fusion image. Experimental results have shown that the proposed method is superior to both classical and state-of-the-art methods. (i) Compared with MST methods such as the DWT, the DTCWT, and the NSCT, the separate treatment of features and background in the low-frequency component helps to preserve important features and improve fusion performance. (ii) The proposed method is better than the method based solely on the new top-hat transform (TH), in that it effectively overcomes the incompleteness of feature extraction inherent in the top-hat transform. (iii) It is also much simpler and more efficient than methods based upon sparse representation.